Week 4: From Distributions to Sampling

1. Introduction

As we think about time-averaging and taphonomic bias in the sedimentary record, we can use R tools to visualize these concepts. We talked last week about basic univariate statistics like means, medians, and modes. We familiarized ourselves a bit more with apply(), table(), and summary() to derive insights into the distribution of our data. We also introduced variance and standard deviations.

This week, we develop our understanding of variance, deviation, and standard deviations a bit more, and we will learn how these descriptions of variation can help us work through some examples of how taphonomy can transform fossil communities, and how we might deal with it. We do this because the sedimentary record contains fossils that are biased by, first, biological production and, second, differential preservation. As we try to bridge ecological and paleoecological scales, we must translate fossil counts into landscape distributions, population estimates, or even past environmental conditions. We can understand biological production and sedimentation more clearly as sampling processes. This means that past biological populations can never be observed directly, only estimated, and that we are always dealing with a biased sample of past communities, to some degree.
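As a toy illustration of this two-stage sampling idea (all taxa, counts, and preservation probabilities here are invented for the sketch), we can simulate a living community and then filter it by differential preservation:

```r
set.seed(2) #Reproducible random draws

#An invented living community of 1000 individuals from three taxa
community = rep(c("robust_taxon", "fragile_taxon", "rare_taxon"),
                times = c(500, 450, 50))

#Differential preservation: each taxon survives with a different probability
p_preserve = c(robust_taxon = 0.8, fragile_taxon = 0.2, rare_taxon = 0.5)

preserved = community[runif(length(community)) < p_preserve[community]]

table(community) / length(community) #True proportions in the living community

table(preserved) / length(preserved) #Biased proportions in the "fossil" sample
```

Even before any counting error, the preserved assemblage over-represents the robust taxon and under-represents the fragile one.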

2. Describing Differences in Observation Variables

Standard deviations help us characterize variation in either a sample or a population. The standard deviation is derived from some related values that are worth knowing.

Methods for describing variation:

  • Mean absolute deviation - the sum of the absolute deviations from the mean, divided by the number of observations. Function call = mad() (note, though, that mad() computes the median absolute deviation by default; see below)
  • Variance - the sum of the squared deviations from the mean, divided by the number of observations
  • Sample variance - identical to variance, but the divisor is corrected to (n - 1). Function call = var()
mad(fake_core$Cyperaceae.undiff.)
## [1] 13.3434
mad_cyp = sum( abs(fake_core$Cyperaceae.undiff. - median(fake_core$Cyperaceae.undiff.) ) ) / length(fake_core$Cyperaceae.undiff.)

mad_cyp
## [1] 8.86

Above, we get different results between R and the manual calculation because the mad() function uses a constant to adjust for "asymptotically normal consistency" (see help(mad)), and because mad() takes the median, not the mean, of the absolute deviations. Notice that we can use either the mean or the median as the basis for absolute deviation in our vectors. In mad(), this is set with the "center =" and "constant =" arguments; in the first manual calculation below, we take the mean of the absolute differences between the individual observations and their mean. This looks different from the formula above because I've subsumed the sum() and /length(x) steps by wrapping the entire calculation in mean(). If we calculate the value manually using the median (second half of the code below), we get the same value as mad() with a constant of 1.

mad(fake_core$Cyperaceae.undiff., center = mean(fake_core$Cyperaceae.undiff.), constant = 1)
## [1] 8.74
mad_cyp = mean(abs(fake_core$Cyperaceae.undiff. - mean(fake_core$Cyperaceae.undiff.)))

mad_cyp
## [1] 8.8652
#Now we calculate the median absolute deviation.

mad_cyp = median(abs(fake_core$Cyperaceae.undiff. - median(fake_core$Cyperaceae.undiff.)))

mad_cyp
## [1] 9
mad(fake_core$Cyperaceae.undiff., constant = 1)
## [1] 9
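To isolate what the constant does, here is a minimal sketch with an arbitrary vector:

```r
x = 1:7 #A simple arbitrary vector; its median is 4

mad(x, constant = 1) #Raw median absolute deviation: 2

mad(x) #The default constant of 1.4826 rescales this to 2.9652, matching the SD for normal data
```

The constant makes the MAD comparable to the standard deviation when the data are normally distributed.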

Here we are working with deviation from the mean or median. This is a logical enough way to express dispersion in a set of numbers, but its applications and interpretation are limited (for example, how easily can we compare between variables?). However, it does set the foundation for how we calculate variance, which is the sum of the squared deviations from the mean divided by the number of observations. For samples, the divisor is the number of observations less one (N - 1). Base R has a function for variance, var(), but we can also calculate it manually.

var(fake_core$Cyperaceae.undiff.)
## [1] 105.164
var_cyp = mean((fake_core$Cyperaceae.undiff. - mean(fake_core$Cyperaceae.undiff.))^2)

#Perhaps R calculates variance for a sample and not a population...

var_cyp = sum((fake_core$Cyperaceae.undiff. - mean(fake_core$Cyperaceae.undiff.))^2)/
  (length(fake_core$Cyperaceae.undiff.)-1) #Note that we're taking N - 1 here. 

var(fake_core$Cyperaceae.undiff.)
## [1] 105.164
var_cyp
## [1] 105.164

In this case, we’ve finally gotten identical values when compared with the base R function. But what does variance mean and how do we interpret it? R lets us generate a lot of fake and specifically structured data. Let’s look at some general features of variance by looking at systematically modified input.

my_df = sapply(1:10, function(x){ #Using function to create progressively larger runs of numbers that are equally spaced.
    seq(x, x*100, x)
})

apply(my_df, 2, var)
##  [1]   841.6667  3366.6667  7575.0000 13466.6667 21041.6667 30300.0000
##  [7] 41241.6667 53866.6667 68175.0000 84166.6667
plot(apply(my_df, 2, var), type = "o", pch = 21, bg = "orange")

Naturally, as the range of values gets larger, the variance goes up. But while each column's maximum grows in even steps of 100, the variance does not grow linearly. Let's look at a longer run of columns and see how variance and the overall spread of the data interact.

my_df = sapply(1:100, function(x){ #Using function to create progressively larger runs of numbers that are equally spaced.
    seq(x, x*100, x)
})

apply(my_df, 2, var)
##   [1]     841.6667    3366.6667    7575.0000   13466.6667   21041.6667
##   [6]   30300.0000   41241.6667   53866.6667   68175.0000   84166.6667
##  [11]  101841.6667  121200.0000  142241.6667  164966.6667  189375.0000
##  [16]  215466.6667  243241.6667  272700.0000  303841.6667  336666.6667
##  [21]  371175.0000  407366.6667  445241.6667  484800.0000  526041.6667
##  [26]  568966.6667  613575.0000  659866.6667  707841.6667  757500.0000
##  [31]  808841.6667  861866.6667  916575.0000  972966.6667 1031041.6667
##  [36] 1090800.0000 1152241.6667 1215366.6667 1280175.0000 1346666.6667
##  [41] 1414841.6667 1484700.0000 1556241.6667 1629466.6667 1704375.0000
##  [46] 1780966.6667 1859241.6667 1939200.0000 2020841.6667 2104166.6667
##  [51] 2189175.0000 2275866.6667 2364241.6667 2454300.0000 2546041.6667
##  [56] 2639466.6667 2734575.0000 2831366.6667 2929841.6667 3030000.0000
##  [61] 3131841.6667 3235366.6667 3340575.0000 3447466.6667 3556041.6667
##  [66] 3666300.0000 3778241.6667 3891866.6667 4007175.0000 4124166.6667
##  [71] 4242841.6667 4363200.0000 4485241.6667 4608966.6667 4734375.0000
##  [76] 4861466.6667 4990241.6667 5120700.0000 5252841.6667 5386666.6667
##  [81] 5522175.0000 5659366.6667 5798241.6667 5938800.0000 6081041.6667
##  [86] 6224966.6667 6370575.0000 6517866.6667 6666841.6667 6817500.0000
##  [91] 6969841.6667 7123866.6667 7279575.0000 7436966.6667 7596041.6667
##  [96] 7756800.0000 7919241.6667 8083366.6667 8249175.0000 8416666.6667
plot(apply(my_df, 2, var),
     type = "o",
     pch = 21, 
     bg = "orange")

lines(apply(my_df, 2, mad)*1000) #Here we're adding the MAD values for the same dataset, multiplied by 1000 to make them visible.

Here, we see that the variance climbs ever more steeply as the total dispersion of the data increases - it grows with the square of the scale of the values, not in step with it. So populations which are structurally identical but which vary in their total magnitude are going to have different variances. Specifically, the larger the values, the more variance we expect to find. We can also see that the median absolute deviation (multiplied by 1,000 to share the axes) starts out above the variance curve and then falls permanently below it. We will explore more about variance and population structure (i.e., the composite frequency distribution of variables in a population) once we have mastered some more univariate statistics.
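We can sketch why the two measures diverge using an arbitrary vector: multiplying every value by a constant k multiplies the variance by k^2, but the median absolute deviation only by k.

```r
var(1:100) #841.6667

var(2 * (1:100)) / var(1:100) #Doubling the values quadruples the variance (ratio of 4)

mad(1:100, constant = 1) #25

mad(2 * (1:100), constant = 1) / mad(1:100, constant = 1) #The MAD merely doubles (ratio of 2)
```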

3. Using Systematic Variation to Test Parameters

We can learn a bit more by creating longer lists of repeating numbers, which will show us some of the impacts of sample size. Below, we make a short list of numbers with a mean and median of 9, then repeat that list 100 times, taking the variance and median absolute deviation each time. The code below gives us three plots: one showing variance and MAD across the 100 repetitions of the number string; one showing a histogram of the data; and one showing the frequency distribution using a combination of "plot()" and "table()".

test_range = c(5:13)

plot_var = sapply(1:100, function(x){
  var(rep(test_range, x))
})

plot_mad = sapply(1:100, function(x){
  mad(rep(test_range, x))
})

par(mfrow = c(1, 3))

plot(plot_var, ylim = c(0,10), pch = 21, bg = "orange")

lines(plot_mad)

hist(rep(test_range, 100))

plot(table(rep(test_range,100)))

par(mfrow = c(1, 1))

We see a minor impact of the total sample size on our measurement of variance, but it quickly approaches an asymptote after the set has been repeated a handful of times. The distribution of this data is even, and we see that increasing the number of observations (so long as they're the same numbers) does not affect our estimates of the mean or median. Will this hold for a set of numbers where the frequency distribution isn't even?

norm_range = #Using <shift+enter>, we can already make a visualization of the counts...
    c(5, 5,
      6, 6, 6,
      7, 7, 7, 7,
      8, 8, 8, 8, 8,
      9, 9, 9, 9, 9, 9,
      10, 10, 10, 10, 10,
      11, 11, 11, 11,
      12, 12, 12,
      13, 13)

plot_var = sapply(1:100, function(x){
  var(rep(norm_range, x))
})

plot_mad = sapply(1:100, function(x){
  mad(rep(norm_range, x))
})

par(mfrow = c(1, 2))

plot(plot_var, ylim = c(floor(min(plot_mad)) - 0.5, ceiling(max(plot_var)) +0.5), pch = 21, bg = "orange")

lines(plot_mad)

plot(table(rep(norm_range,100)))

par(mfrow = c(1, 1))

Let’s try this with a v-shaped distribution.

v_range = c(
      5, 5, 5, 5, 5, 5,
      6, 6, 6, 6, 6, 
      7, 7, 7, 7, 
      8, 8, 8,
      9, 9, 
      10, 10, 10, 
      11, 11, 11, 11,
      12, 12, 12, 12, 12,
      13, 13, 13, 13, 13, 13)

plot_var = sapply(1:100, function(x){
  var(rep(v_range, x))
})

plot_mad = sapply(1:100, function(x){
  mad(rep(v_range, x))
})

par(mfrow = c(1, 2))

plot(plot_var, ylim = c(floor(min(plot_mad)) - 0.5, ceiling(max(plot_var)) +0.5), pch = 21, bg = "orange")

lines(plot_mad)

plot(table(rep(v_range,100)))

par(mfrow = c(1, 1))

The replication structure we’re using here will serve us in the future. Let’s codify the repetitions and plotting into a function to save us all this copy-pasting.

stat_lines = function(x, #Creating a function called "stat_lines"; it expects a vector of data "x" and the following arguments
                      times = length(x), #Setting this to "length(x)" by default, so calling this won't always be necessary. But we can supply our own numbers!
                      title = NA){ #Setting up a title argument, defaulting to NA
  
  plot_u = sapply(1:times, function(y){ #Here's that fast custom function that creates means for the repeated set of numbers.
    mean(rep(x, y))
  })
  
  plot_med = sapply(1:times, function(y){ #Same, but for the median.
    median(rep(x, y))
  })
  
  plot_var = sapply(1:times, function(y){ #Same, but for variance.
    var(rep(x, y))
  })
  
  plot_mad = sapply(1:times, function(y){ #Same, but for median absolute deviation
    mad(rep(x, y))
  })
  
  stat_labels = c("mean", "median", "variance", "median abs. dev.")
  
  plot_stats = data.frame(plot_u, plot_med, plot_var, plot_mad)
  
  plot(0, 0, xlim = c(0, times), ylim = c(0, max(plot_stats)+1), pch = NA, xlab = "N repeats", ylab = "value", main =paste0(title," Repeated Statistics"))
  
  for(i in 1:ncol(plot_stats)){
    points(plot_stats[, i], type = "o", pch = 21, lty = i, bg = i, cex = 0.7)
    text(times*0.05+(i*(0.2*times)), min(plot_stats[, i])+(0.04*max(plot_stats)), labels = stat_labels[i])
  }
  
}

stat_lines(v_range, title = "V-Shaped Range")

This function makes plotting much easier, allowing us to make some quick comparisons between our two small datasets, which we expand using the "rep()" function.

par(mfrow = c(1,2))

stat_lines(v_range, title = "V-Shaped Range", times = 100)

stat_lines(norm_range, title = "Normal Range", times = 100)

par(mfrow = c(1,1))

We can apply this same approach to some of the fake core data.

par(mfrow = c(2,2))

stat_lines(fake_core$Cyperaceae.undiff., title = "Cyperaceae")
stat_lines(fake_core$Typha, title = "Typha")
stat_lines(fake_core$Alchornea, title = "Alchornea")
stat_lines(fake_core$Guibourtia.demeusei, title = "Guibourtia")

par(mfrow = c(1,1))

What all of this demonstrates is that:

  1. Variables with different frequency distributions may have the same mean and/or median value.
  2. Variance grows with the square of the data's overall scale, so it climbs faster and faster as the total dispersion of the data increases.
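Point 1 is easy to check directly with two small, arbitrary vectors that share a center but differ in spread:

```r
a = c(5, 9, 13) #Wide spread
b = c(8, 9, 10) #Narrow spread

mean(a) #9
mean(b) #9 - identical means...

var(a) #16
var(b) #1 - ...but very different variances
```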

4. At Last, Standard Deviations

Thus, to standardize our estimate of deviation, we take the square root of variance.

sd(fake_core$Cyperaceae.undiff.)
## [1] 10.25495
sd_cyp = sqrt(var(fake_core$Cyperaceae.undiff.))

sd_cyp
## [1] 10.25495

Let’s see how it performs when we systematically manipulate the data again.

par(mfrow = c(1, 2))

my_df = sapply(1:10, function(x){ #Using function to create progressively larger runs of numbers that are equally spaced.
    seq(x, x*100, x)
})

#apply(my_df, 2, sd)

plot(apply(my_df, 2, sd), type = "o", pch = 21, bg = "orange")
lines(apply(my_df, 2, mad))

my_df = sapply(1:1000, function(x){
  seq(x, x*100, x)
})

plot(apply(my_df, 2, sd), type = "o", pch = 21, bg = "orange")
lines(apply(my_df, 2, mad))

par(mfrow = c(1, 1))

Let’s add standard deviations to our basic plots and remove the variance.

stat_lines = function(x, #Creating a function called "stat_lines"; it expects a vector of data "x" and the following arguments
                      times = length(x), #Setting this to "length(x)" by default, so calling this won't always be necessary. But we can supply our own numbers!
                      title = NA){ #Setting up a title argument, defaulting to NA
  
  plot_u = sapply(1:times, function(y){ #Here's that fast custom function that creates means for the repeated set of numbers.
    mean(rep(x, y))
  })
  
  plot_med = sapply(1:times, function(y){ #Same, but for the median.
    median(rep(x, y))
  })
  
  #plot_var = sapply(1:times, function(y){ #Same, but for variance. #We can keep our code and just hash this out.
  #  var(rep(x, y))
  #})
  
  plot_mad = sapply(1:times, function(y){ #Same, but for median absolute deviation
    mad(rep(x, y))
  })
  
  plot_sd = sapply(1:times, function(y){
    sd(rep(x, y))
  })
  
  stat_labels = c("mean", "median", "median abs. dev.", "SD") #Because we automate everything below, all we have to do is modify the names.
  
  plot_stats = data.frame(plot_u, plot_med, plot_mad, plot_sd) #We're using the plot_stats dataframe below, so we modify the entry here.
  
  plot(0, 0, xlim = c(0, times), ylim = c(0, max(plot_stats)+1), pch = NA, xlab = "N repeats", ylab = "value", main =paste0(title," Repeated Statistics"))
  
  for(i in 1:ncol(plot_stats)){
    points(plot_stats[, i], type = "o", pch = 21, lty = i, bg = i, cex = 0.7)
    text((times/ncol(plot_stats)*i), min(plot_stats[, i])+(0.04*max(plot_stats)), labels = stat_labels[i]) #Changing text plotting to fit new variable.
  }
  
}

Let’s plot our selected taxa again.

par(mfrow = c(2,2))

stat_lines(fake_core$Cyperaceae.undiff., title = "Cyperaceae")
stat_lines(fake_core$Typha, title = "Typha")
stat_lines(fake_core$Alchornea, title = "Alchornea")
stat_lines(fake_core$Guibourtia.demeusei, title = "Guibourtia")

par(mfrow = c(1,1))

At last, we see that the SD is slightly more conservative than the median absolute deviation. Because we've taken variance, which is sensitive to total dispersion, and taken its square root, we end up with a value in the same units as the original data, which is far easier to interpret. This is why most of us grasp some basic principles of standard deviations without knowing why the value has these characteristics. In short, the standard deviation is the typical distance from the mean that characterizes a set of values.
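One familiar consequence, sketched here with simulated normal data: roughly 68% of values fall within one standard deviation of the mean.

```r
set.seed(1) #Fix the seed so the draw is reproducible

sim = rnorm(10000, mean = 50, sd = 10) #Simulated normal data

prop_within_1sd = mean(abs(sim - mean(sim)) <= sd(sim)) #Proportion within one SD of the mean

prop_within_1sd #Approximately 0.68
```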

5. Back to Frequency Distributions, Sampling Basics

Armed with our newfound knowledge of variance, we can make good use of some powerful R functions:

  • rnorm() - generates a set of numbers based on a normal distribution.
  • sample() - randomly samples from a given input.
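Since sample() will do much of the work in our later sampling exercises, here is a minimal sketch (the taxon pool and its proportions are made up for illustration):

```r
set.seed(7) #Reproducible draws

#An imaginary pollen pool: 60% Cyperaceae, 30% Typha, 10% Alchornea
pool = rep(c("Cyperaceae", "Typha", "Alchornea"), times = c(60, 30, 10))

grab = sample(pool, size = 50, replace = TRUE) #Draw 50 grains with replacement

table(grab) #Counts approximate, but rarely match, the true proportions
```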

We can put these functions to work with some population thinking. rnorm() lets us create a normally distributed sample using arguments for the number of random values (n), the mean, and the standard deviation.

norm_vector = rnorm(n = 100, mean = 50, sd = 10)

hist(norm_vector)

Let’s see how this distribution changes when we systematically increase the sample size.

par(mfrow = c(3,2)) #We're making six plots

invisible(sapply(1:6, function(x){ #Using a custom function six times; invisible() suppresses the printed list of histogram objects
  hist(rnorm(n = 10^x, #Histogram of normally distributed random numbers, with n increasing by orders of magnitude (10^x)
             mean = 50, #Set the mean of the random values
             sd = 10), #Set the standard deviation
       breaks = 50, #Breaks for histogram
       main = paste0("Histogram for n = ", 10^x)) #Title for plots
}))
par(mfrow = c(1,1))

R also lets us create skewed data using rbeta(). There are two arguments for shape ("shape1" and "shape2"). As the two values get closer together, the distribution becomes more symmetric and normal-looking.
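As a side note (the shape values below are arbitrary), the mean of a Beta(shape1, shape2) distribution is shape1 / (shape1 + shape2), which is why unequal shapes pull the mass toward one end of the 0-1 interval:

```r
set.seed(3) #Reproducible draws

m_equal = mean(rbeta(1e5, 5, 5)) #Equal shapes: mean near 5 / (5 + 5) = 0.5, symmetric

m_skewed = mean(rbeta(1e5, 2, 10)) #Unequal shapes: mean near 2 / (2 + 10), mass piled toward zero

c(m_equal, m_skewed)
```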

par(mfrow = c(3,2))

invisible(sapply(exp(seq(0.2, 1.2, 0.2)), function(x){ #Using a custom function six times; invisible() suppresses the printed output
  hist(rbeta(10000, #Histogram of 10000 random numbers drawn from a beta distribution
             x, #Setting the first shape value
             10), #Setting the second shape value
       breaks = 50, #Breaks for histogram
       main = paste0("Histogram for shape1 = ", round(x, 2))) #Title for plots
}))
par(mfrow = c(1,1))

We modify the shape arguments to create skews in the other direction.

par(mfrow = c(3,2))

invisible(sapply(exp(seq(0.2, 1.2, 0.2)), function(x){ #Using a custom function six times; invisible() suppresses the printed output
  hist(rbeta(10000, #Histogram of 10000 random numbers drawn from a beta distribution
             10, #Setting the first shape value
             x), #Setting the second shape value
       breaks = 50, #Breaks for histogram
       main = paste0("Histogram for shape2 = ", round(x, 2))) #Title for plots
}))
par(mfrow = c(1,1))

If we hold one shape value fixed below one, the distribution approaches an exponential-like shape, and it no longer tends toward normality as the two shape values become similar.

par(mfrow = c(3,2))

invisible(sapply(1:6, function(x){ #Using a custom function six times; invisible() suppresses the printed output
  hist(rbeta(10000, #Histogram of 10000 random numbers drawn from a beta distribution
             0.2, #Fixing the first shape value below one
             x), #Varying the second shape value
       breaks = 50, #Breaks for histogram
       main = paste0("Histogram for shape2 = ", x)) #Title for plots
}))
par(mfrow = c(1,1))

Being able to manipulate distributions in this way lets us encode specific expectations about how production and preservation may bias our estimates of past community composition and of rates of past ecological change. Let's look at our basic statistics again, using skewed distributions. We will build four vectors, each with 1000 random values drawn from a different distribution.

my_log <- rlogis(1000)
my_exp <- rexp(1000)
my_nrm <- rnorm(1000)
my_skw <- rbeta(1000, 10, 2)

my_dist <- data.frame(my_log, my_exp, my_nrm, my_skw)

my_hist <- function(x, data){
  hist(data[,x], breaks = 50, main = colnames(data)[x])
}

invisible(lapply(1:ncol(my_dist), my_hist, data = my_dist))

boxplot(my_dist, horizontal = TRUE, las = 1)

We encounter a problem with the above: the ranges of the variables are rather different. Note too that rlogis() draws from the logistic distribution, which is more or less normal in shape but has heavier tails, so its minimum and maximum sit disproportionately far from the mean.
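One common remedy for the mismatched ranges (a sketch; whether it is appropriate depends on the analysis) is to standardize each variable to z-scores with scale(), which subtracts the mean and divides by the standard deviation:

```r
set.seed(11) #Reproducible draws

raw = rnorm(1000, mean = 50, sd = 10) #An arbitrary example variable

z = as.numeric(scale(raw)) #Center and rescale; as.numeric() drops the matrix wrapper

round(mean(z), 10) #0

round(sd(z), 10) #1
```

After scaling, every variable has mean 0 and SD 1, so their spreads can be compared directly.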

my_df = sapply(1:1000, function(x){
  rnorm(100, mean = x*10, sd = 10)
})

apply(my_df, 2, var)
##    [1]  91.91677 113.33116  90.87983 105.54010  89.65999 126.64008  92.66892
##    [8] 116.21103  96.14015  77.37445 102.21696 106.94374 100.09283  97.92641
## ... (output truncated: 1000 sample variances, all scattered around the true value of 10^2 = 100)
##  [757] 127.84911  86.65321  92.89539 109.99302  87.71453 100.32872 120.77091
##  [764] 106.21770  88.67180 105.59745 123.91861 121.76138 109.45543  87.29483
##  [771] 140.30911  96.62310  86.29664  86.32731  88.90527 117.99440 113.08237
##  [778] 102.02438  98.81798 120.12142 102.55900  90.18104 103.66985  95.42713
##  [785]  76.19061  90.12125  90.08163 101.48788 119.05045  77.94110  80.77187
##  [792]  96.90511  87.07511 103.86231  74.68653 112.60492 111.70109 112.13339
##  [799]  97.34086  97.16948 102.66192  90.51406 105.19919  96.49342  96.85392
##  [806] 105.80258  99.02701 108.01814  84.74296 103.87043  95.37539 128.73740
##  [813]  97.20202  86.53338  86.48398 106.55412  98.04686  83.51767  88.09206
##  [820] 100.21649 117.97879 103.49657  98.88599  90.78284 107.51189 113.85407
##  [827]  90.62667  89.00409 117.65644  81.91938  82.13353  89.76232 110.26529
##  [834]  98.95940  98.78905  98.32357  90.45355  90.95602 106.25293  91.57631
##  [841]  92.95336 113.84132 110.81600 107.20375 112.21967 114.56182  89.50202
##  [848] 103.27642  96.91910  83.90832 138.27276  92.48370  87.51808 134.36444
##  [855]  86.96423  88.06919  94.39880 109.74516 102.93710 112.95168  93.53859
##  [862]  92.22776  98.74434  74.47272  96.31675  82.66148 121.00624 103.25672
##  [869]  82.02852 126.65983  81.33713 110.05960 103.63864  90.30336  99.99469
##  [876]  85.76242 106.30419 104.27716  89.28885 105.12334  86.31607  94.21505
##  [883]  85.55419  96.66701  93.54439  90.09010 133.93919 116.32783  89.79148
##  [890] 104.65716 115.50019  95.57168  95.74042  95.81791 107.13785 108.98565
##  [897]  96.09153 110.09904 109.93073 102.13408 104.35366 125.75633 108.00095
##  [904] 119.69912 106.21025  95.85990 123.69424 108.91040  84.72641  90.40928
##  [911] 116.79424  74.68891  95.31744 112.78691 110.84211 113.10626 130.91303
##  [918]  80.15320  90.60672  74.43497 135.48271  98.42966 106.68668 109.71477
##  [925] 103.92461  92.64505  93.59235 104.08821 103.51648  91.75358  96.92633
##  [932] 124.48836  98.15869  74.31321  81.41446  91.18166  96.67804 111.74737
##  [939] 132.47768 104.89508 107.72754 113.34083  95.84822 107.03621 117.11172
##  [946]  92.02580  87.39153 107.47717  83.33146 103.50323 102.55630  97.25066
##  [953]  88.45252 111.81069  97.55445  93.70770  92.01069 102.26630 114.31766
##  [960]  80.01533  87.27592  85.46819  88.91451  95.29368 100.79865  74.84715
##  [967]  95.53612 114.98630 122.74582 104.13728  83.97891 111.10306  92.35974
##  [974] 115.75341 100.77157  97.98733 106.67828  77.47949 114.42733  93.87777
##  [981] 108.09538  96.48020  90.27027 107.26490 115.97458 136.82657  91.44925
##  [988] 118.01160 102.06270  92.12152  83.16737  85.89726  95.53821  90.76898
##  [995]  92.22294 110.95742  88.45593 117.11183  93.40461 101.63439
# Plot the variance of each column of my_df to compare spread across variables
plot(apply(my_df, 2, var), type = "o", pch = 21, bg = "orange")
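As a quick check of the distinction drawn above between variance and sample variance, the sketch below (not part of the lesson; the small vector `x` is made up for illustration) computes both denominators by hand and confirms that R's built-in `var()` uses the sample (n - 1) correction.

```r
# Minimal sketch: population vs. sample variance, computed by hand.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # hypothetical toy data
n <- length(x)
dev_sq <- (x - mean(x))^2        # squared deviations from the mean

pop_var    <- sum(dev_sq) / n        # population variance: divide by n
sample_var <- sum(dev_sq) / (n - 1)  # sample variance: divide by (n - 1)

# R's var() matches the (n - 1) version, not the population version
isTRUE(all.equal(var(x), sample_var))
```

For small n the two versions differ noticeably (here 4 vs. about 4.57), which is why the (n - 1) correction matters when estimating the variance of a population from a limited sample, such as a fossil count.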